An upper bound on prototype set size for condensed nearest neighbor
Author
Abstract
The condensed nearest neighbor (CNN) algorithm is a heuristic for reducing the number of prototypical points stored by a nearest neighbor classifier, while keeping the classification rule given by the reduced prototypical set consistent with the full set. I present an upper bound on the number of prototypical points accumulated by CNN. The bound originates in a bound on the number of times the decision rule is updated during training in the multiclass perceptron algorithm, and thus is independent of training set size.
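The condensing heuristic described in the abstract can be sketched as follows. This is a minimal illustration of the CNN procedure as it is commonly described, not code from the paper: the names `nearest_label` and `condense` and the toy data are assumptions made for the example, and the sketch assumes that no two identical points carry different labels (otherwise the loop would not terminate).

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_label(store, x):
    """Return the label of the stored prototype closest to x (1-NN rule)."""
    return min(store, key=lambda p: dist(p[0], x))[1]


def condense(points, labels):
    """Keep only the training points that the current store misclassifies."""
    data = list(zip(points, labels))
    store = [data[0]]           # seed the store with the first training point
    changed = True
    while changed:              # repeat passes until a full pass adds nothing
        changed = False
        for x, y in data:
            if nearest_label(store, x) != y:
                store.append((x, y))   # misclassified: add it as a prototype
                changed = True
    return store


# Hypothetical toy data: two well-separated classes.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]

prototypes = condense(points, labels)
# The reduced set classifies every training point the same way the labels do.
consistent = all(nearest_label(prototypes, x) == y for x, y in zip(points, labels))
```

A prototype is added only when the current store misclassifies a training point, which parallels a mistake-driven update; the abstract's bound counts such additions. The order of the training points affects which prototypes are kept.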
Similar resources
On trees attaining an upper bound on the total domination number
A total dominating set of a graph $G$ is a set $D$ of vertices of $G$ such that every vertex of $G$ has a neighbor in $D$. The total domination number of a graph $G$, denoted by $\gamma_t(G)$, is the minimum cardinality of a total dominating set of $G$. Chellali and Haynes [Total and paired-domination numbers of a tree, AKCE International Journal of Graphs and Combinatorics 1 (2004), 6...
Bootstrapping for efficient handwritten digit recognition
In this paper we present two algorithms for selecting prototypes from the given training data set. Here, we employ the bootstrap technique to preprocess the data. We compare the proposed algorithms with the condensed nearest-neighbor algorithm which is order dependent and a genetic-algorithm-based prototype selection scheme which is order independent. Algorithms proposed in this paper are found...
A Cluster-Based Merging Strategy for Nearest Prototype Classifiers
A generalized prototype-based learning scheme founded on hierarchical clustering is proposed. The basic idea is to obtain a condensed nearest neighbor classification rule by replacing a group of prototypes by a representative while approximately keeping their original classification power. The algorithm improves and generalizes previous works by explicitly introducing the concept of cluster and...
Reducing the Response Time for Activity Recognition Through use of Prototype Generation Algorithms
The nearest neighbor approach is one of the most successfully deployed techniques used for sensor-based activity recognition. Nevertheless, this approach presents some disadvantages in relation to response time, noise sensitivity and high storage requirements. The response time and storage requirements are closely related to the data size. This notion of data size is an important issue in senso...
Prototype Selection for Interpretable Classification
Prototype methods seek a minimal subset of samples that can serve as a distillation or condensed view of a data set. As the size of modern data sets grows, being able to present a domain specialist with a short list of “representative” samples chosen from the data set is of increasing interpretative value. While much recent statistical research has been focused on producing sparse-in-the-variab...
Journal title:
- CoRR
Volume: abs/1309.7676
Publication date: 2013